remotemanager.dataset.dataset module

Main Dataset module

This is the primary class used by the user

class remotemanager.dataset.dataset.Dataset(function: Callable | str | None, url: URL | None = None, dbfile: str | None = None, transport: Transport | None = None, serialiser: serial | None = None, script: str | None = None, shebang: str | None = None, name: str | None = None, extra_files_send: List[str] | str | None = None, extra_files_recv: List[str] | str | None = None, verbose: int | bool | Verbosity | None = None, run_summary_limit: int = 25, add_newline: bool = True, skip: bool = True, extra: str | None = None, **global_run_args)

Bulk holder for remote runs. The Dataset class handles anything regarding the runs as a group. Running, retrieving results, sending to remote, etc.

Parameters:
  • function (Callable, str, None) – Function to run. Can be the function object, a source string, or None. If None, Runner will pass arguments to the script method

  • url (URL) – connection to remote (optional)

  • transport (Transport) – transport system to use, if a specific is required. Defaults to transport.rsync

  • serialiser (serial) – serialisation system to use, if a specific is required. Defaults to serial.serialjson

  • script (str) – callscript required to run the jobs in this dataset

  • submitter (str) – command to exec any scripts with. Defaults to “bash”

  • name (str) – optional name for this dataset. Will be used for runscripts

  • extra_files_send (list, str) – extra files to send with this run

  • extra_files_recv (list, str) – extra files to retrieve with this run

  • skip (bool) – skip dataset creation if possible. Defaults True

  • extra – extra text to insert into the runner jobscripts

  • global_run_args – any further (unchanging) arguments to be passed to the runner(s)
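A minimal usage sketch built only from the signatures documented on this page. The function multiply and the dataset name "multiply_demo" are hypothetical, and what happens when no url is passed depends on the library defaults, which are not described here.

```python
def multiply(a, b):
    """Toy function to execute through the Dataset (hypothetical)."""
    return a * b


def example_workflow():
    """Append two runs, execute them, and collect the results."""
    # Imported lazily so this file can be loaded without remotemanager installed.
    from remotemanager.dataset.dataset import Dataset

    # No url is passed; the connection is documented as optional.
    ds = Dataset(function=multiply, name="multiply_demo")

    # Each append_run call stores one set of arguments for a later runner.
    ds.append_run(args={"a": 3, "b": 4})
    ds.append_run(args={"a": 5, "b": 6})

    ds.run()                          # submit the runners
    ds.wait(interval=1, timeout=60)   # poll until they complete
    ds.fetch_results()                # copy results back into the runners
    return ds.results
```

Calling example_workflow() requires remotemanager to be installed and a working connection.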

default_url

A default URL can be assigned to all Datasets.

Type:

URL

property verbose: Verbosity

Verbose property

classmethod recreate(*args, raise_if_not_found: bool = True, **kwargs)

Attempts to extract a dataset matching the given args from the python garbage collection interface

Parameters:
  • raise_if_not_found (bool) – raise ValueError if the Dataset was not found

  • *args – args as passed to Dataset

  • **kwargs – keyword args as passed to Dataset

Returns:

Dataset

classmethod from_file(file: str, url: URL | None = None)

Alias for Dataset.unpack(file=…)

Parameters:
  • file (str) – Dataset dbfile

  • url (URL) – the URL to apply to this Dataset

Returns:

unpacked Dataset

Return type:

(Dataset)

property database: Database

Access to the stored database object. Creates a connection if none exists.

Returns (Database):

Database

property dbfile: str

Name of the database file

sanitise_run_arg_paths(run_args: dict) → dict

Checks for issues in the paths within the given run_args

static sanitise_path(path) → str

Ensures a clean unix-type path

property remote_dir: str

Accesses the remote_dir property from the run args. Tries to fall back on run_dir if not found, then returns default as a last resort.
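The fallback order described above can be sketched as a plain dictionary lookup. This is an illustration, not the library's code: resolve_remote_dir and the default value "remote_run_dir" are hypothetical names chosen for the example.

```python
DEFAULT_REMOTE_DIR = "remote_run_dir"  # hypothetical default, not taken from the library


def resolve_remote_dir(run_args: dict, default: str = DEFAULT_REMOTE_DIR) -> str:
    """Return remote_dir if set, else run_dir, else the default."""
    for key in ("remote_dir", "run_dir"):
        value = run_args.get(key)
        if value is not None:
            return value
    return default
```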

property run_dir: str | None

Accesses the remote_dir property from the run args. Tries to fall back on run_dir if not found, then returns default as a last resort.

property run_path: str | bool

Accesses the remote_dir property from the run args. Tries to fall back on run_dir if not found, then returns default as a last resort.

property local_dir: str

Accesses the local_dir property from the run args. Returns default if not found.

property repo_prefix: str

override for repo names and manifest file in a dependency situation

property repofile: TrackedFile

Returns the TrackedFile instance responsible for the repository

property bash_repo: TrackedFile

Returns the TrackedFile instance responsible for the repository

property master_script: TrackedFile

Returns the TrackedFile instance responsible for the master script

property manifest_log: TrackedFile

Returns the TrackedFile instance responsible for the manifest

property global_run_args: dict

Returns the toplevel global run args

set_run_arg(key: str, val)

Set a single run arg key to val

Parameters:
  • key – name to set

  • val – value to set to

Returns:

None

set_run_args(keys: list, vals: list)

Set a list of keys to vals

Note

List lengths must be the same

Parameters:
  • keys – list of keys to set

  • vals – list of vals to set to

Returns:

None
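The pairing of keys and vals can be pictured with a small stand-in operating on a plain dict. How the real method reacts to mismatched lengths is not documented here; raising ValueError is an assumption made for this sketch.

```python
def set_run_args_sketch(run_args: dict, keys: list, vals: list) -> None:
    """Assign vals[i] to keys[i] in run_args, in place."""
    if len(keys) != len(vals):
        # Assumption: reject mismatched lists rather than silently truncating.
        raise ValueError("keys and vals must have the same length")
    for key, val in zip(keys, vals):
        run_args[key] = val
```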

update_run_args(d: dict)

Update current global run args with a dictionary d

Parameters:

d – dict of new args

Returns:

None

property do_not_recurse: bool

Internal function used for blocking recursion in dependency calls

property dependency: Dependency | None

Returns the stored dependency

property is_child: bool

Returns True if this dataset is a child, False otherwise

property is_parent: bool

Returns True if this dataset is a parent, False otherwise

set_downstream(dataset) → None

Add a child to this dataset

set_upstream(dataset) → None

Add a parent to this dataset

pack(file: str = None, **kwargs) → dict | None

Override for the SendableMixin.pack() method, ensuring the dataset is always below a uuid

Parameters:

**kwargs – Any arguments to be passed onwards to the SendableMixin.pack()

Returns:

(dict) packing result

update_db(dependency_call: bool = False) → None

Force updates the database

set_run_option(key: str, val) → None

Update a global run option key with value val

Parameters:
  • key (str) – option to be updated

  • val – value to set

append_run(args: dict = None, arguments: dict = None, name: str = None, extra_files_send: list | str | None = None, extra_files_recv: list | str | None = None, dependency_call: bool = False, verbose: int = None, quiet: bool = False, skip: bool = True, force: bool = False, lazy: bool = False, chain_run_args: bool = True, extra: str = None, return_runner: bool = False, **run_args)

Serialise arguments for later runner construction

Parameters:
  • args (dict) – dictionary of arguments to be unpacked

  • arguments (dict) – alias for args

  • name (str) – append a runner under this name

  • extra_files_send (list, str) – extra files to send with this run

  • extra_files_recv (list, str) – extra files to retrieve with this run

  • dependency_call (bool) – True if called via the dependency handler

  • verbose (int, Verbose, None) – verbose level for this runner (defaults to Dataset level)

  • quiet (bool) – disable printing for this append if True

  • skip (bool) – ignores checks for an existing runner if set to False

  • force (bool) – always appends if True

  • lazy (bool) – performs a “lazy” append if True, skipping the dataset update. You MUST call ds.finish_append() after you are done appending to avoid strange behaviours

  • chain_run_args (bool) – for dependency runs, will not propagate run_args to other datasets in the chain if False (defaults True)

  • extra – extra string to add to this runner

  • return_runner – returns the appended (or matching) runner if True

  • run_args – any extra arguments to pass to runner
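A sketch of the lazy append pattern described for the lazy parameter, using only the documented append_run and finish_append signatures. The function add, the dataset name "lazy_demo" and the run count are hypothetical.

```python
def add(a, b):
    """Toy function for the example (hypothetical)."""
    return a + b


def lazy_append_example(n_runs: int = 100):
    """Append many runs without updating the dataset each time."""
    # Imported lazily so this file can be loaded without remotemanager installed.
    from remotemanager.dataset.dataset import Dataset

    ds = Dataset(function=add, name="lazy_demo")
    for i in range(n_runs):
        # lazy=True skips the per-append dataset update
        ds.append_run(args={"a": i, "b": 1}, lazy=True)

    # Required after lazy appends: updates the database once for the batch.
    ds.finish_append()
    return ds
```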

insert_runner(runner: Runner, skip: bool = True, force: bool = False, lazy: bool = False, verbose: None | int | bool | Verbosity = None, quiet: bool = False, return_runner: bool = False) → None | Runner

Internal runner insertion.

Parameters:
  • runner – Runner object to insert

  • skip – don’t insert if it exists

  • force – force inserts

  • lazy – Attempts a lazy append if True (does not update DB)

  • verbose – Verbosity level for this runner

  • quiet – inserts runner quietly if True

  • return_runner – Returns the runner object if True

Returns:

None or Runner

finish_append(dependency_call: bool = False, print_summary: bool = True, verbose: None | int | bool | Verbosity = None) → None

Completes the append process by updating the database, and printing a summary if necessary

Parameters:
  • dependency_call – Will not attempt to relay to a dependency if True (called by dependency)

  • print_summary – Prints a summary if True

  • verbose – verbosity level for this call

lazy_append() → LazyAppend

Access a LazyAppend object, which handles the append finalisation

copy_runners(dataset: Dataset) → None

Copy the runners from dataset over to this dataset

remove_run(ident: int | str | dict, dependency_call: bool = False, verbose: None | int | bool | Verbosity = None) → bool

Remove a runner with the given identifier. Search methods are identical to those of get_runner(ident)

Parameters:
  • ident – identifier

  • dependency_call (bool) – used by any dependencies that exist, prevents recursion

  • verbose – local verbose level

Returns:

True if succeeded

Return type:

(bool)

get_runner(ident: int | str | dict, dependency_call: bool = False, verbose: None | int | bool | Verbosity = None) → Runner | None

Collect a runner with the given identifier. Depending on the type of arg passed, there are different search methods:

  • int: the index of the runner in the runners list, i.e. runners[ident]

  • str: searches for a runner with the matching uuid

  • dict: attempts to find a runner with matching args

Parameters:
  • ident – identifier

  • dependency_call (bool) – used by the dependencies, runners cannot be removed via uuid in this case, as the uuids will not match between datasets

Returns:

collected Runner, None if not available

Return type:

(Runner)
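The type-based search can be sketched with a stand-in runner class. StandInRunner and find_runner are hypothetical names; the real method may match uuids or arguments differently (for example via short uuids), so this is illustrative only.

```python
from typing import List, Optional, Union


class StandInRunner:
    """Hypothetical minimal runner holding only a uuid and its arguments."""

    def __init__(self, uuid: str, args: dict):
        self.uuid = uuid
        self.args = args


def find_runner(
    runners: List[StandInRunner], ident: Union[int, str, dict]
) -> Optional[StandInRunner]:
    """Dispatch on the type of ident, returning None when nothing matches."""
    if isinstance(ident, int):
        # int: position in the runner list
        try:
            return runners[ident]
        except IndexError:
            return None
    if isinstance(ident, str):
        # str: exact uuid match
        return next((r for r in runners if r.uuid == ident), None)
    if isinstance(ident, dict):
        # dict: runner whose arguments equal ident
        return next((r for r in runners if r.args == ident), None)
    return None
```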

wipe_runs(dependency_call: bool = False, confirm: bool = True) → None

Removes all runners

Parameters:
  • dependency_call (bool) – used by any dependencies that exist, prevents recursion

  • confirm (bool) – Asks for confirmation if True

reset_runs(wipe: bool = False, dependency_call: bool = False, confirm: bool = True) → None

Remove any results from the stored runners and attempt to delete their result files if wipe=True

Warning

This is a potentially destructive action; be careful with this method.

Parameters:
  • wipe – Additionally deletes the local files if True. Default False

  • dependency_call (bool) – used by any dependencies that exist, prevents recursion

  • confirm (bool) – Asks for confirmation if True

collect_files(remote_check: bool, results_only: bool = False, extra_files_send: bool = True) → list

Collect created files

Parameters:
  • remote_check – search for remote paths if True

  • results_only – only collect files that are returned from a run such as Results and extra_files_recv if True

  • extra_files_send – collects extra_files_send if True

Returns:

list of filepaths

wipe_local(files_only: bool = True, dry_run: bool = False, dependency_call: bool = False, confirm: bool = True) → None

Clear out the local directory

Parameters:
  • files_only (bool) – delete individual files instead of whole folders (preserves extra files)

  • dry_run (bool) – print targets and exit

  • dependency_call (bool) – used by any dependencies that exist, prevents recursion

  • confirm (bool) – Asks for confirmation if True

wipe_remote(files_only: bool = True, dry_run: bool = False, dependency_call: bool = False, confirm: bool = True) → None

Clear out the remote directory (including run dir)

Parameters:
  • files_only (bool) – delete individual files instead of whole folders (preserves extra files)

  • dry_run (bool) – print targets and exit

  • dependency_call (bool) – used by any dependencies that exist, prevents recursion

  • confirm (bool) – Asks for confirmation if True

hard_reset(files_only: bool = True, dry_run: bool = False, dependency_call: bool = False, confirm: bool = True) → None

Hard reset the dataset, including wiping local and remote folders

Parameters:
  • files_only (bool) – delete individual files instead of whole folders (preserves extra files)

  • dry_run (bool) – print targets and exit

  • dependency_call (bool) – used by any dependencies that exist, prevents recursion

  • confirm (bool) – Asks for confirmation if True

backup(file=None, force: bool = False, full: bool = False) → str

Backs up the Dataset and any attached results/extra files to zip file

Parameters:
  • file – target path

  • force – overwrite file if it exists

  • full – only collects runner results if False (defaults False)

Returns:

path to zip file

classmethod restore(file, force: bool = False) → Dataset

Restore from backup file file

Parameters:
  • file – File to restore from

  • force – Set to True to overwrite any existing Dataset

Returns:

Dataset

property runner_dict: dict

Stored runners in dict form, where the keys are the append id

property runners: List[Runner]

Stored runners as a list

property states: List[RunnerState]

Runner states as a list of RunnerState

property string_states: List[str]

Runner states as a list of strings

property function: Function | Script | None

Currently stored Function wrapper

property extra: str

Returns the global level extra

property shebang: str

Returns the URL shebang

property script: str

Currently stored run script

Parameters:

sub_args – arguments to substitute into the script() method

Returns:

arg-substituted script

Return type:

(str)

property add_newline: bool

Returns True if add_newline is set

This controls if scripts have an additional newline enforced at the end

property submitter: str

Currently stored submission command

property url: URL

Currently stored URL object

property transport: Transport

Currently stored Transport system

property serialiser: serial

Returns the stored serialiser object

remove_database() → None

Deletes the database file

property name: str

Name of this dataset

property uuid: str

This Dataset’s full uuid (64 character)

property short_uuid: str

This Dataset’s short format (8 character) uuid
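The two lengths can be illustrated with a hash digest. Both the use of SHA-256 (whose hex digest happens to be 64 characters) and the short form being a prefix of the full uuid are assumptions for this sketch; the page does not state how the library derives either value.

```python
import hashlib


def make_uuid(text: str) -> str:
    """Return a 64-character hex string (assumption: SHA-256 hex digest)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def shorten(uuid: str) -> str:
    """Return an 8-character short form (assumption: the leading characters)."""
    return uuid[:8]
```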

set_runner_states(state: str, uuids: list = None, extra: str = None, force: bool = False) → None

Update runner states to state

Parameters:
  • state (str) – state to set

  • uuids (list) – list of uuids to update, updates all if not passed

check_all_runner_states(state: str) → bool

Check all runner states against state, returning True if all runners have this state

Parameters:

state (str) – state to check for

Returns (bool):

all(states)
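The all(states) return value amounts to the following check over a list of state strings. The state names in the usage lines are hypothetical. Note that Python's all() returns True for an empty list; whether an empty Dataset behaves the same way is not stated here.

```python
def check_all_states(states: list, target: str) -> bool:
    """True only if every entry in states equals target."""
    return all(state == target for state in states)


# Hypothetical state names, for illustration only
all_done = check_all_states(["completed", "completed"], "completed")
some_pending = check_all_states(["completed", "submitted"], "completed")
```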

property last_run: int | None

Returns the unix time of the last _run call

Returns:

unix time of last _run call, or None if impossible

Return type:

(int)

property run_summary_limit: int

If there are more runners than this number, the run output will be summarised rather than printed in full

property summary_only: bool

Returns True if the number of runners exceeds the summary limit. Otherwise, returns False.

Used for printing a shortened output when running.
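The relationship between run_summary_limit and summary_only reduces to a strict comparison. This sketch assumes "exceeds" means strictly greater than; the default of 25 comes from the Dataset signature.

```python
def summary_only_sketch(n_runners: int, run_summary_limit: int = 25) -> bool:
    """True when the runner count exceeds the limit, so output is shortened."""
    return n_runners > run_summary_limit
```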

retry_failed(*args, **kwargs) → None

Retries all failed runners

Takes args and kwargs, passes them to run

stage(uuids: List[str] = None, force: bool = False, dependency_call: bool = False, extra: str = '', force_ignores_success: bool = False, verbose: Verbosity = None, **run_args) → bool

Stage all runners, generating all files and preparing for transfer and execution.

Returns a boolean, True if any new content was written.

transfer(uuids: List[str] = None, force: bool = False, dependency_call: bool = False, extra: str = '', force_ignores_success: bool = False, verbose: Verbosity = None, **run_args) → bool

Transfer the files to the remote

run(force: bool = False, dry_run: bool = False, verbose: None | int | bool | Verbosity = None, uuids: list = None, extra: str = '', force_ignores_success: bool = False, dependency_call: bool = False, **run_args) → bool

Run the functions

Parameters:
  • force (bool) – force all runs to go through, ignoring checks

  • dry_run (bool) – create files, but do not run

  • verbose – Sets local verbose level

  • uuids (list) – list of uuids to run

  • extra – extra text to add to runner jobscripts

  • failed_only (bool) – If True, force will submit only failed runners

  • force_ignores_success (bool) – If True, force takes priority over is_success check

  • dependency_call (bool) – Internally used to block recursion issues with dependencies

  • run_args – any arguments to pass to the runners during this run. Will override any “global” arguments set at Dataset init
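The override of global arguments by per-run run_args can be pictured as a dictionary merge in which later keys win. The key names remote_dir and cores are hypothetical, and the library's actual merge logic may be more involved.

```python
def effective_run_args(global_args: dict, local_args: dict) -> dict:
    """Merge the two dicts; entries in local_args take priority."""
    return {**global_args, **local_args}


# Hypothetical keys: one set at Dataset init, one overridden at run time
merged = effective_run_args(
    {"remote_dir": "base", "cores": 1},
    {"cores": 4},
)
```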

property run_cmd: CMD

Access to the storage of CMD objects used to run the scripts

Returns:

List of CMD objects

Return type:

(list)

check_states(state: str) → dict

Call the repo “last_time” method remotely

check_started() → dict

Check when runners started remotely, using the manifest

property is_finished: list

Queries the finished state of this Dataset

property is_finished_force: list

Queries the finished state of this Dataset

property all_finished: bool

Check if all runners have finished

Returns (bool):

True if all runners have completed their runs

property all_success: bool

Returns True if all runners report that they have succeeded

wait(interval: int | float = 10, timeout: int | float = None, watch: bool = False, success_only: bool = False, only_runner: Runner = None, force: bool = False) → None

Watch the calculation, printing updates as runners complete

Parameters:
  • interval – check interval time in seconds

  • timeout – maximum time to wait in seconds

  • watch – print an updating table of runner states

  • success_only – Completion search ignores failed runs if True

  • only_runner – wait for only this runner to complete

  • force – Raises dataset level errors as errors if True

Returns:

None
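The interval and timeout parameters suggest a polling loop along these lines. The helper wait_until is hypothetical, and raising TimeoutError once the limit passes is an assumption; this page does not say what the real method does on timeout.

```python
import time


def wait_until(check, interval: float = 10, timeout: float = None) -> None:
    """Call check() every interval seconds until it returns True."""
    start = time.monotonic()
    while not check():
        if timeout is not None and time.monotonic() - start > timeout:
            # Assumption: signal the timeout with an exception.
            raise TimeoutError("condition not met within the timeout")
        time.sleep(interval)
```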

fetch_results(results: bool = True, errors: bool = True, extras: bool = True, force: bool = False, verbose: None | int | bool | Verbosity = None)

Fetch results from the remote, and store them in the runner results property

Parameters:
  • results – fetch result files

  • errors – fetch error files

  • extras – fetch extra files

Returns:

None

update_runners(runners: list | None = None, dependency_call: bool = False)

Collects the manifest file, updating runners

Parameters:
  • runners – list of runners to update, usually used for dependencies

  • dependency_call – internal flag to avoid dependency loops

property results: list

Access the results of the runners

Returns (list):

runner.result for each runner

property errors: list

Access the errors of the runners

Returns (list):

runner.error for each runner

property failed: list

Returns a list of failed runners

Returns:

list of failed runners

prepare_for_transfer() → None

Ensures that the Transport class is able to function

avoid_runtime() → None

Call for last_runtime sensitive operations such as is_finished and fetch_results

Waits for 1s if we’re too close to the saved _last_run time

Returns:

None
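One plausible reading of the 1s wait is that whole-second timestamps cannot order two events that fall within the same second. The sketch below assumes "too close" means less than one second since last_run; the sleep argument is injectable purely so the behaviour can be checked without actually waiting.

```python
import time


def avoid_runtime_sketch(last_run, now=None, sleep=time.sleep) -> bool:
    """Sleep for 1s if now is within a second of last_run. Returns True if it slept."""
    if last_run is None:
        # Nothing has been run yet, so there is nothing to avoid.
        return False
    now = int(time.time()) if now is None else now
    if now - last_run < 1:  # assumption: "too close" means under one second
        sleep(1)
        return True
    return False
```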

remotemanager.dataset.dataset.line_starts_with_uuid(line: str) → bool

Checks if line starts with a short uuid

Returns True if line starts like “a1b2c3d4”, False otherwise
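A check of this kind could be written with a regular expression. The sketch assumes a short uuid is exactly 8 lowercase hexadecimal characters, based on the 8 character short format and the "a1b2c3d4" example; the library's own test may differ.

```python
import re

# Assumption: a short uuid is 8 lowercase hex characters at the start of the line.
SHORT_UUID = re.compile(r"^[0-9a-f]{8}")


def starts_with_short_uuid(line: str) -> bool:
    """True if the first 8 characters of line look like a short uuid."""
    return SHORT_UUID.match(line) is not None
```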